qiime demux emp-single \
--i-seqs artifacts/multiplexed-emp-single-end.qza \
--m-barcodes-file data/sample-metadata.tsv \
--m-barcodes-column barcode-sequence \
--o-per-sample-sequences artifacts/demux-single-end.qza \
--o-error-correction-details artifacts/demux-details.qza
The sample metadata file must also be provided with “--m-barcodes-file”, and the metadata
column that holds the barcode sequences must be specified with “--m-barcodes-column”. The
command produces two artifacts: one for the demultiplexed reads and one for the
demultiplexing details.
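Because the demultiplexing command depends on the barcode column being present in the metadata file, it can help to check for it first. The following is a minimal sketch, not part of QIIME2: the helper name has_column is hypothetical, and it assumes that the metadata file is tab-separated with the column names on its first line.

```shell
# Hypothetical pre-flight check (not a QIIME2 command).
# Assumption: the metadata file is tab-separated and its first line is the header.
metadata="data/sample-metadata.tsv"
column="barcode-sequence"

has_column() {
    # Split the header line on tabs and look for an exact, whole-name match.
    head -n 1 "$1" | tr '\t' '\n' | grep -qxF -- "$2"
}

if [ -f "$metadata" ] && has_column "$metadata" "$column"; then
    echo "Found column '$column' in $metadata"
else
    echo "Column '$column' not found in $metadata" >&2
fi
```

The exact match (grep -x) matters here: a column named “barcode” would not be mistaken for “barcode-sequence”.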
Figure 7.3 shows the commands and their usage for demultiplexing both file formats.
Once the raw data have been imported into a QIIME2 artifact and any multiplexed reads have
been demultiplexed, all types of raw data are preprocessed and analyzed in the same way.
Therefore, we will discuss the remaining steps of the analysis with QIIME2 through a
worked example.
7.3.3 Downloading and Preparing the Example Data
As an example, we will download raw sequence data from the NCBI SRA database. The
data are amplicon-based 16S rRNA gene sequences, generated by NGS, from a study that
compared the effect of a yoga-based intervention with that of a low-FODMAP diet in patients
with irritable bowel syndrome. FODMAP is an acronym for Fermentable Oligosaccharides,
Disaccharides, Monosaccharides, and Polyols, which are short-chain carbohydrates that are
poorly absorbed in the small intestine. The metagenomic 16S rRNA gene data, sequenced
from fecal samples, are available for 86 patients with irritable bowel syndrome, grouped
into (i) patients who received yoga sessions and (ii) patients who received a low-FODMAP
diet. The NCBI BioProject accession for this study is PRJEB24421. We will download the
FASTQ files from the NCBI SRA database and then follow the QIIME2 pipeline step by step
to analyze these data. The data are demultiplexed paired-end sequences: two FASTQ files
(forward and reverse) for each sample.
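Since every sample should come with both a forward and a reverse file, a quick check for missing mates can catch incomplete downloads early. The sketch below is illustrative only: the function name find_unpaired is hypothetical, and the file naming (<run>_1.fastq.gz for forward reads, <run>_2.fastq.gz for reverse reads, stored in a “data” directory) is an assumption that may not match the names your download tool produces.

```shell
# Sketch: print the run names whose forward file has no matching reverse file.
# Assumption: files are named <run>_1.fastq.gz (forward) and <run>_2.fastq.gz (reverse).
find_unpaired() {
    dir="$1"
    for fwd in "$dir"/*_1.fastq.gz; do
        [ -e "$fwd" ] || continue              # the pattern matched no files
        rev="${fwd%_1.fastq.gz}_2.fastq.gz"    # swap the suffix to get the mate
        [ -e "$rev" ] || basename "$fwd" _1.fastq.gz
    done
}

find_unpaired data
```

No output means that every forward file found in the directory has a reverse mate.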
7.3.3.1 Downloading the Raw Data
To download the files of all experiments, we need to obtain the run accessions of the
experiments in the BioProject. To keep files organized, we will create a directory to store
the raw data of this project. Open the Linux terminal and create a directory named after the
BioProject accession, and inside that directory, create a subdirectory named “data”
as follows:
mkdir PRJEB24421
cd PRJEB24421
mkdir data
In the next step, we will save the run accessions of the BioProject in a text file in the
“data” subdirectory. To do that, you can open the NCBI SRA database and search for the